Analytical Mean Squared Error Curves in Temporal Difference Learning

Authors

  • Satinder Singh
  • Peter Dayan
Abstract

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its step-size and eligibility-trace parameters.
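The paper derives the bias and variance (and hence the mean squared error, MSE = bias² + variance) of the value estimates analytically. The sketch below only illustrates the setting empirically: lookup-table TD(λ) with offline (end-of-trial) updates in a small absorbing random-walk chain, with the squared error averaged over many independent runs to trace out a learning curve. The particular chain, step size α, and λ are illustrative assumptions, not the configurations studied in the paper.

```python
# Empirical MSE curve for lookup-table TD(lambda) with offline (per-trial) updates
# in a small absorbing Markov chain. Simulation sketch only; the paper computes
# these curves analytically. Chain, alpha, and lambda are illustrative assumptions.
import numpy as np

N_STATES = 5                                        # non-terminal states of a random walk
TRUE_V = np.arange(1, N_STATES + 1) / (N_STATES + 1)  # exact values for this chain

def run_trial(rng):
    """One trajectory of non-terminal states and the terminal reward (1 on the right)."""
    s = N_STATES // 2
    states = [s]
    while True:
        s += rng.choice([-1, 1])
        if s < 0:
            return states, 0.0
        if s >= N_STATES:
            return states, 1.0
        states.append(s)

def td_lambda_offline(n_trials, alpha, lam, rng):
    """TD(lambda) with accumulating traces; updates are applied only at trial end."""
    v = np.zeros(N_STATES)
    mse = np.empty(n_trials)
    for t in range(n_trials):
        states, reward = run_trial(rng)
        delta_v = np.zeros(N_STATES)    # offline: accumulate within the trial
        e = np.zeros(N_STATES)          # eligibility trace (undiscounted chain)
        for i, s in enumerate(states):
            e *= lam
            e[s] += 1.0
            next_v = reward if i == len(states) - 1 else v[states[i + 1]]
            td_error = next_v - v[s]
            delta_v += alpha * td_error * e
        v += delta_v                    # apply the accumulated update after the trial
        mse[t] = np.mean((v - TRUE_V) ** 2)
    return mse

rng = np.random.default_rng(0)
curve = np.mean([td_lambda_offline(100, alpha=0.1, lam=0.9, rng=rng)
                 for _ in range(200)], axis=0)
print(curve[:10])   # mean squared error per trial, averaged over 200 runs
```

Sweeping α and λ in such a simulation reproduces, noisily, the kind of sensitivity to the step-size and eligibility-trace parameters that the analytical curves characterize exactly.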


Similar articles

Analytical Mean Squared Error Curves in Temporal Difference Learning

Peter Dayan, Brain and Cognitive Sciences, E25-210, MIT, Cambridge, MA 02139 ([email protected]). We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve...



An Analysis of Temporal-Difference Learning with Function Approximation

We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on-line, during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence with probabili...
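For contrast with the lookup-table setting above, here is a minimal sketch of the kind of algorithm this entry analyzes: on-line TD with a linear function approximator, updated after every transition along a single trajectory of a discounted ergodic chain. The λ=0 case is used for brevity, and the chain, rewards, features, step size, and discount factor are illustrative assumptions, not taken from that paper.

```python
# On-line TD(0) with linear function approximation along a single trajectory
# of an irreducible, aperiodic, discounted Markov chain (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.5, 0.5, 0.0],      # transition matrix of a small ergodic chain
              [0.1, 0.6, 0.3],
              [0.3, 0.0, 0.7]])
r = np.array([1.0, 0.0, 2.0])       # expected one-step reward per state
phi = np.array([[1.0, 0.0],         # feature vectors: V(s) approximated by phi[s] @ theta
                [0.5, 0.5],
                [0.0, 1.0]])
gamma, alpha = 0.9, 0.01
theta = np.zeros(2)

s = 0
for t in range(50_000):
    s_next = rng.choice(3, p=P[s])
    td_error = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td_error * phi[s]   # on-line update after every transition
    s = s_next

print("approximate values:", phi @ theta)
```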



Regularization Learning of Neural Networks for Generalization

In this paper, we propose a regularization-based learning method for neural networks and analyze its generalization capability. In learning from examples, training samples are independently drawn from some unknown probability distribution. The goal of learning is to minimize the expected risk for future test samples, which are drawn from the same distribution. The problem can b...
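To make the setting described here concrete, below is a minimal sketch of regularized empirical-risk minimization: i.i.d. training samples from an unknown distribution, an average loss over them, plus a penalty term. Ridge regression (weight decay on a linear model) is used as a stand-in; the authors' actual network model and regularizer are not specified in this excerpt, so everything below is an illustrative assumption.

```python
# Regularized empirical risk minimization (ridge / weight-decay stand-in).
# Data distribution, model, and regularization strength are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 10
X = rng.normal(size=(n, d))                    # i.i.d. training inputs
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)      # noisy targets

lam = 1.0                                      # regularization strength
# Minimize (1/n) * ||Xw - y||^2 + lam * ||w||^2 (closed form for the linear case)
w_hat = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

# Estimate the expected risk on fresh samples drawn from the same distribution
X_test = rng.normal(size=(5000, d))
y_test = X_test @ w_true + 0.5 * rng.normal(size=5000)
print("test MSE:", np.mean((X_test @ w_hat - y_test) ** 2))
```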



Journal title:

Volume / Issue:

Pages:

Publication date: 1997